Classifying Korean comparative sentences for comparison analysis

Authors

  • Seon Yang
  • Youngjoong Ko
Abstract

Comparisons sort objects based on their superiority or inferiority, and they may have major effects on a variety of evaluation processes. The Web facilitates qualitative and quantitative comparisons via online debates, discussion forums, product comparison sites, etc., and comparison analysis is becoming increasingly useful in many application areas. This study develops a method for classifying sentences in Korean text documents into several different comparative types to facilitate their analysis. We divide our study into two tasks: 1) extracting comparative sentences from text documents, and 2) classifying comparative sentences into seven types. In the first task, we investigate many actual comparative sentences by referring to previous studies and construct a lexicon of comparisons. Sentences that contain elements from the lexicon are regarded as comparative sentence candidates. Next, we use machine learning techniques to eliminate non-comparative sentences from the candidates. In the second task, we roughly classify the comparative sentences using keywords and use a transformation-based learning method to correct initial classification errors. Experimental results show that our method could be suitable for practical use. We obtained an F1-score of 90.23% in the first task, an accuracy of 81.67% in the second task, and an overall accuracy of 88.59% for the integrated system with both tasks.

1 Introduction

In many areas, comparisons are very important during decision making. For example, politicians may change their political strategies after monitoring how their policies compare with those of their competitors. Manufacturers can also change their marketing strategies after comparing their products with those of their competitors. A similar situation also applies to customers. If a customer is deciding whether to buy Car-A or Car-B, he/she will probably access the Web and type these two items into the search box. A search engine such as Google will then find relevant documents. Next, the customer will open and read each retrieved document until he/she obtains enough information. The customer's decision may be dominated by sentences that compare these two items. It is clear that obtaining information from the Web is a good and easy solution. However, it is also clear that reading many documents until sufficient information has been acquired is still a time-consuming task. If the customer only has access to a small amount of data, he/she may form biased views. By contrast, reading large amounts of data demands an enormous amount of time and effort. Therefore, it would be very useful in many …
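The two-task pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The English toy lexicon, the type labels, the hand-written correction rule, and the names `LEXICON`, `RULES`, `classify`, and `keep` are all assumptions made for illustration. The paper works with a Korean lexicon, a trained machine learning filter, and correction rules learned by transformation-based learning.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy lexicon: keyword -> rough comparative type.
# The paper's lexicon is Korean; these labels are illustrative only.
LEXICON = {
    "than": "greater_or_lesser",
    "rather": "greater_or_lesser",
    "most": "superlative",
    "same": "equality",
}

# Hypothetical correction rules: (current_label, trigger_word, new_label).
# Transformation-based learning would learn and order such rules from data.
RULES: List[Tuple[str, str, str]] = [
    ("greater_or_lesser", "same", "equality"),
]


def tokenize(sentence: str) -> List[str]:
    """Lowercase whitespace tokens with simple trailing punctuation stripped."""
    return [tok.strip(".,!?").lower() for tok in sentence.split()]


def is_candidate(sentence: str) -> bool:
    """Task 1, step 1: keep sentences containing any lexicon keyword."""
    return any(tok in LEXICON for tok in tokenize(sentence))


def rough_label(sentence: str) -> Optional[str]:
    """Task 2, step 1: label by the first lexicon keyword found."""
    for tok in tokenize(sentence):
        if tok in LEXICON:
            return LEXICON[tok]
    return None


def apply_rules(sentence: str, label: Optional[str]) -> Optional[str]:
    """Task 2, step 2: apply the correction rules in order."""
    tokens = set(tokenize(sentence))
    for current, trigger, new in RULES:
        if label == current and trigger in tokens:
            label = new
    return label


def classify(
    sentences: List[str], keep: Callable[[str], bool]
) -> List[Tuple[str, Optional[str]]]:
    """Run both tasks; `keep` stands in for the trained non-comparative filter."""
    results = []
    for s in sentences:
        if is_candidate(s) and keep(s):
            results.append((s, apply_rules(s, rough_label(s))))
    return results
```

In this sketch, `keep` is a placeholder for the machine learning step that eliminates non-comparative candidates. Any callable that returns `True` for sentences to retain can be passed in, for example a wrapper around a trained classifier.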

Similar resources

Extracting Comparative Entities and Predicates from Texts Using Comparative Type Classification

The automatic extraction of comparative information is an important text mining problem and an area of increasing interest. In this paper, we study how to build a Korean comparison mining system. Our work is composed of two consecutive tasks: 1) classifying comparative sentences into different types and 2) mining comparative entities and predicates. We perform various experiments to find releva...

Parsing Korean Comparative Constructions in a Typed-Feature Structure Grammar

Jong-Bok Kim, Jaehyung Yang, and Sanghoun Song. 2010. Parsing Korean Comparative Constructions in a Typed-Feature Structure Grammar. Language and Information 14.1, 1–24. The complexity of comparative constructions in each language has given challenges to both theoretical and computational analyses. This paper first identifies types of comparative constructions in Korean and discusses their mai...

The Improvement of Negative Sentences Translation in English-to-Korean Machine Translation

This paper describes the algorithm for translating English negative sentences into Korean in English-Korean Machine Translation (EKMT). The proposed algorithm is based on the comparative study of English and Korean negative sentences. The earlier translation software cannot translate English negative sentences into accurate Korean equivalents. We established a new algorithm for the negative sen...

Extracting Comparative Sentences from Korean Text Documents Using Comparative Lexical Patterns and Machine Learning Techniques

This paper proposes how to automatically identify Korean comparative sentences from text documents. This paper first investigates many comparative sentences referring to previous studies and then defines a set of comparative keywords from them. A sentence which contains one or more elements of the keyword set is called a comparative-sentence candidate. Finally, we use machine learning technique...

Finding relevant features for Korean comparative sentence extraction

In this paper, we study how to extract comparative sentences from Korean text documents. We decompose our task into three steps: 1) collecting comparative keywords; 2) extracting comparative-sentence candidates by keyword searching; 3) eliminating non-comparative sentences from these candidates using machine learning techniques. We perform various experiments to find relevant features. As a res...

Journal title:
  • Natural Language Engineering

Volume 20

Publication date 2014